Generating captions without looking beyond objects

Authors

  • Hendrik Heuer
  • Christof Monz
  • Arnold W. M. Smeulders
Abstract

This paper explores new evaluation perspectives for image captioning and introduces a noun translation task that achieves comparable image caption generation performance by translating from a set of nouns to captions. This implies that in image captioning, all word categories other than nouns can be supplied by a powerful language model without sacrificing performance on n-gram precision. The paper also investigates lower and upper bounds on how much individual word categories in the captions contribute to the final BLEU score. Substantial room for improvement remains for nouns, verbs, and prepositions.
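
As a rough illustration of the n-gram precision mentioned in the abstract, the sketch below computes the clipped (modified) n-gram precision that BLEU builds on. The example captions and the function names are invented here for illustration and are not taken from the paper. Full BLEU additionally combines several n-gram orders and applies a brevity penalty, both of which this sketch omits.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference captions.

    Each candidate n-gram count is clipped to the maximum number of times
    that n-gram occurs in any single reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0

    # Maximum count of each n-gram over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical captions, chosen only to make the arithmetic easy to follow.
reference = "a man rides a horse on the beach".split()
full_caption = "a man rides a horse on a beach".split()
nouns_only = "man horse beach".split()

print(modified_precision(full_caption, [reference], 1))  # 0.875
print(modified_precision(nouns_only, [reference], 1))    # 1.0
print(modified_precision(nouns_only, [reference], 2))    # 0.0
```

In this toy case the bare noun list matches every unigram but no bigram. That is consistent with the abstract's framing: the nouns carry the content, while a language model has to supply the remaining words for the higher-order n-grams to match.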

Similar resources

The effects of captioning texts and caption ordering on L2 listening comprehension and vocabulary learning

This study investigated the effects of captioned texts on second/foreign (L2) listening comprehension and vocabulary gains using a computer multimedia program. Additionally, it explored the caption ordering effect (i.e. captions displayed during the first or second listening), and the interaction of captioning order with the L2 proficiency level of language learners in listening comprehension a...

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data

The aim of image captioning is to have machines generate captions similar to those humans write to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate language structure patterns and thus tend to fall into a stereotype of replicating frequent phrases or sentences while neglecting unique aspects of each image. In th...

Improving Accessibility of Transaction-centric Web Objects

Advances in web technology have considerably widened the Web accessibility divide between sighted and blind users. This divide is especially acute when conducting online transactions, e.g., shopping, paying bills, making travel plans, etc. Such transactions span multiple web pages and require that users find clickable objects (e.g., “add-to-cart” button) which are essential for transaction prog...

STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for the English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japane...

Looking Beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers

Identifying and extracting figures and tables along with their captions from scholarly articles is important both as a way of providing tools for article summarization, and as part of larger systems that seek to gain deeper, semantic understanding of these articles. While many “off-the-shelf” tools exist that can extract embedded images from these documents, e.g. PDFBox, Poppler, etc., these to...

Journal:
  • CoRR

Volume: abs/1610.03708

Publication date: 2016